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Abstract 

Folding of Ubiquitin (Ub), a functionally important protein found in eukaryotic organisms, is in¬ 
vestigated at low and neutral pH at different temperatures using simulations of the coarse-grained 
Self-Organized-Polymer model with side chains (SOP-SC). The melting temperatures (T^s), identi¬ 
fied with the peaks in the heat capacity curves, decreases as pH decreases, in qualitative agreement 
with experiments. The calculated radius of gyration, showing dramatic variations with pH, is in 
excellent agreement with scattering experiments. At Tm Ub folds in a two-state manner at low 
and neutral pH. Clustering analysis of the conformations sampled in equilibrium folding trajecto¬ 
ries at Tm, with multiple transitions between the folded and unfolded states, show a network of 
metastable states connecting the native and unfolded states. At low and neutral pH, Ub folds with 
high probability through a preferred set of conformations resulting in a pH-dependent dominant 
folding pathway. Folding kinetics reveal that Ub assembly at low pH occurs by multiple pathways 
involving a combination of nucleation-collapse and diffusion collision mechanism. The mechanism 
by which Ub folds is dictated by the stability of the key secondary structural elements responsible 
for establishing long range contacts and collapse of Ub. Nucleation collapse mechanism holds if 
the stability of these elements are marginal, as would be the case at elevated temperatures. If the 
lifetimes associated with these structured microdomains are on the order of hundreds of /xsec then 
Ub folding follows the diffusion-collision mechanism with intermediates many of which coincide 
with those found in equilibrium. Folding at neutral pH is a sequential process with a populated 
intermediate resembling that sampled at equilibrium. The transition state structures, obtained 
using a Pfoid analysis, are homogeneous and globular with most of the secondary and tertiary 
structures being native-like. Many of our findings for both the thermodynamics and kinetics of 
folding are not only in agreement with experiments but also provide missing details not resolvable 
in standard experiments. The key prediction that folding mechanism varies dramatically with pH 
is amenable to experimental tests. 
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Introduction 


Major advances in experiments^’^ and theory^^®, and creation of coarse-grained models 
rooted in theory^°^^^ have produced a comprehensive framework for quantitatively describ¬ 
ing the way single domain proteins fold. More recently, technical advances have made it 
possible to generate long (nearly milli seconds in some instances) folding trajectories using 
atomically detailed simulations in water for several small proteins^^’^®. These developments 
have ushered a new era in protein folding in which it is imperative to develop theoretical 
and computational models so that detailed comparisons with experiments can be made^^. 
Many researchers assume that this task requires all atom simulations using empirical force 
helds. An alternative is to develop coarse-grained (CG) models, which have proven to have 
exceptional predictive power not only in the study of self-assembly of proteins but also in 
understanding larger complexes and biological machines^^’^^’^^. We use this approach, which 
we contend is powerful not to mention computationally tractable, to simulate the folding of 
Ubiquitin (Ub) at low and neutral pH. 

The importance of the 76-residue Ub, a regulatory protein present in eukaryotic organ¬ 
isms, can hardly be overstated. Ub is involved in a large number of functions ranging from 
transcriptional regulation to protein degradation and executes these functions by attaching 
(ubiquitinating) to a number of substrate proteins with great structural diversity. Depend¬ 
ing on the function, monoubiquitation^® and polyubiquitination^®, both of which are post- 
translational modihcations, have been characterized. In addition to the intrinsic interest in 
the folding of this small protein, recent studies have established a link between the stability, 
dynamics, and function of Ub^°. The monomeric native fold of Ub has hve /d-strands, a long 
a-helix, a 3io helix woven together by a complex topology (Fig. lA). The Ca contact-map 
(Fig. SI) illustrates prominent interactions between the residues of (^ 1(^2 hairpin, and long 
range contacts involving strands (di and /ds, and /ds and /ds, and the residues in the loops Li 
and L 2 . Contacts between residues that are distant along the sequence (especially strands 
/di and /ds, /ds and /ds, and loops Li and L 2 ) makes for high contact order, which could be 
the reason for the complex folding kinetics for a moderate sized protein. The overall folding 
itself can be accurately estimated using tf ~ TQexp{^J\N)Y^ where Tq ~ 1/US. For the 
76-residue Ub, rp ^ Q rns, which agrees remarkably well with experiments^^’^^. However, 
accurate theoretical estimates of tf rnay be this by itself does not provide insights into the 


3 



molecular underpinnings of the folding process, which require simulations. 

Here, we explore Ub folding using the Self-Organized Polymer modeH^ with emphasis 
on native interactions, which has been used to study not only protein folding^^ but also 
several other complex processes ranging from bacterial transcription initiation^®, response of 
microtubule to force^^’^®, and indentation of virus particles^®. The model emphasizes native 
interactions based on the structure of the folded state. The role on non-native interactions, 
which has been discussed extensively (see below for additional discussions), has been shown 
to much less dominant in determining the folding of well designed proteins^"^’^®’^^. Various 
aspects of Ub folding have been explored using both atomistic and CG simulations®^^^^. 
Previous CG simulations, based on representation without consideration of electrostatic 
effects have elucidated the slow dynamics in monomeric Ub at low temperatures^®’^® and 
revealed a change in the folding mechanism as the temperature is lowered^®. Because Ub 
folding thermodynamics depends dramatically on pH^^ it is crucial to consider electrostatic 
interactions. Using the SOP model with side chains (SOP-SC) including charge effects, 
we provide a quantitative description of the thermodynamics and kinetics of folding as a 
function of pH, which we mimic by modifying the electrostatic interactions. The simulations 
capture the thermodynamics of Ub folding qualitatively and predict, for the hrst time, the 
dimensions in the unfolded state accurately. Interestingly, we predict that although there is a 
network of connected states linking the folded and unfolded states implying multiple folding 
pathways, there is a dominant folding path along which Ub self-assembles underscoring 
the need for probabilistic description of the folding process. The dominant path is found 
to change dramatically with pH. Our results for folding thermodynamics and kinetics are 
in semi-quantitative agreement with a number of experiments, thus establishing that CG 
models can capture the physics of protein folding. 


Methods 

Self Organized Polymer-Side Chain (SOP-SC) model: We model the polypeptide 
chain using the coarse-grained Self Organized Polymer - Side Chain model (SOP-SC)^"^ in 
which each residue is represented using two interaction centers, one for the backbone atoms 
and the other for the side chains (SCs). Out of the 76 residues in Ub, 23 are charged (Fig. lA 
and SI), which we include in the SOP-SC model (see below). The centers of the interaction 
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centers, representing the backbone atoms and the side chain atoms, are at the atom 
position of the residue, and the center of mass of the side chain, respectively. The SCs 
interact via a residue-dependent Betancourt-Thirumalai statistical potential^^. At low pH 
the acidic residues are protonated, which minimizes the effect of electrostatic interactions on 
the folding of Ub. To simulate folding at neutral pH we added charges by placing them on 
the side chains of the charged residues. The SOP-SC models for Ub are constructed using 
the crystal structure"^® (Protein Data Bank (PDB) ID; lUBQ). 

The energy of a conformation in the SOP-SC models is a sum of bonded (B) and non- 
bonded (NB) interactions. The interaction between a pair of covalently connected beads 
(two successive atoms or SC connected to a Ca atom) is represented by Finite Extensible 
Nonlinear Elastic (FENE) potential. The non-bonded interactions are a sum of native (N) 
and non-native (NN) interactions. If two beads are separated by at least 3 bonds, and if the 
distance between them in the coarse-grained crystal structure is less than a cutoff distance 
Rc (Table SI) then their interactions are considered native. The rest of the pairs of beads, 
not covalently linked or native, are classihed as non-native interactions. 

The force-held in the SOP-SC model is: 


77' _ Z? I I rpNN I \ 7-ie/ 

tjTOT — + ^NB + ^NB + • 


( 1 ) 


The FENE potential, Eb, between covalently linked beads is given by 



where is the total number of bonds between the beads in the coarse grained model of 
the polypeptide chain. For Ub, Nb = 151. 

The non-bonded native interactions, E^bj Eq. 1 is. 



where A’^(=177), A^^(=486), and A^^(=204) are the numbers of backbone-backbone, 
backbone-sidechain, sidechain-sidechain native interactions, respectively, kB is the Boltz¬ 
mann constant, r* is the distance between the pair of residues, and rcry,i is the corre¬ 
sponding distance in the crystal structure. The numbers in the parentheses are for Ub. 
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The strength of interaction between the pair of side chain beads i, is taken from the 
Betancourt-Thirumalai statistical potentiaB^. The values of and are the same as the 
ones used in our previous studies on the folding of GFP^^. Thus, the crucial which 

determines protein stability, is transferable. 

The non-native interactions, E^^, in Eq. 1 is taken to be 



where iVjvAr(=10159 in Ub) is the total number of non-native interactions, iVa„g(=224 in Ub) 
is the number of pair of beads separated by 2 bonds in the SOP-SC model, cTj is the sum 
of the radii of the pair of residues. The radii for side chains of amino acids are given in 
Table S2. The values of the interaction parameters used in the energy function are given in 
Table SI in the SI. 

Because Ub has 23 charged residues (Fig. lA) we expect electrostatic interactions to be 
important at neutral pH. The last term in Eq. 1 accounts for electrostatic effects, which are 
modeled using the screened Coulomb potential. 


Eel 


Afc-l Vc 

E E 


i=l j=(i+l) 


QiQj exp{-nrij) 



( 5 ) 


where Nc is the number of charged residues, g* and qj are the charges on the side chains of 
the and residues respectively, k is the inverse Debye length, and rij is the distance 
between interaction centers located at the centers of mass of side chains i and j. If charges 
are present on two bonded residues, then electrostatic interactions between these residues is 
ignored. The value of measured in unit of electron charge, is -|-1 for positively charged 
residue and is -1 for negatively charged residues. In implicit solvent simulations of proteins 
a range of dielectric constants with values from 2-20 are typically used^^. We used a value of 
lOco (co is vacuum permitivity), which gave a reasonable radius of gyration of the protein in 
the unfolded state. We calculated k assuming a monovalent salt of 10 millimolar is present 
in the solution. The parameter A in Eq. 1 is intended to account for pH effects. At neutral 
pH A = 1.0. In acidic pH, E^’' is not as relevant and hence we set A = 0. At low pH the 
charges on the negatively charged residues are neutralized. Because the positively charged 
residues no longer can engage in salt bridges the polypeptide chain should swell leading to 
an increase in Rg. The residues bearing the positively charged residues would be hydrated. 
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resulting in a reduction in the value of the effective charge (gj). As a result electrostatic 
static repulsion between the like charges would be softened. Given that these charges are 
well separated in Ub it stands to reason that interactions between positively charge would 
not be as strong in the unfolded state as might be naively estimated based on Coulomb’s law. 
So to a hrst approximation, we neglected the small repulsive interaction, as it is likely to be 
a perturbation to the hydrophobic interaction. This approximation is not inconsistent with 
experiments showing that a mutant of S6 in which sixteen charged residues were replaced 
folded (albeit with altered properties)"^® leading the authors to argue that charge interactions 
must not be paramount to folding. 

The parameters in the SOP-SC energy function are given in the Supplementary Informa¬ 
tion (SI). 

The rationale for using native-centric CG models to decipher the folding mechanisms of 
proteins can be traced to several previous computational and theoretical®’^ studies. The 
earliest studies using lattice models®^’®® showed that, for well designed proteins, non-native 
interactions are likely relevant only during the initial stages of folding resulting in the collapse 
of the polypeptide chains®®. These hndings were also corroborated in certain atomic detailed 
simulations in implicit solvents^®. It was also shown that the conformations in the transition 
state ensemble contain predominantly native interactions®®’^®, with non-native interactions 
forming with small probability®®. More recently, analyses based on atomically detailed 
and Go-Go model simulations for the distribution of the fraction of native contacts in the 
transition path were found to be similar®^. These studies reinforce the notion that on time 
scales exceeding the collapse time it is likely that only native interactions direct protein 
folding. 

Simulations and data analysis: Following our earlier studies we used^®’^® low friction 
Langevin dynamics simulations®^ to obtain the thermodynamic properties, and Brownian 
dynamics simulations®® to simulate the folding kinetics (see SI for details). 

Ntot 

We used the structural overlap function®"® y = 1 — ^ 0 (5 — — r®|) to distinguish 

2=1 

between different populated states of the protein . Here, Wot(= 11026) is the number of 
pairs of interaction centers in the SOP-SG model of Ub assuming that the interaction centers 
are separated by at least 2 bonds, rj is the distance between the pair of beads, and r® 
being the corresponding distance in the folded state, 0 is the Heaviside step function, and 
5 = 2A. Examples of folding trajectories at neutral and acidic pH along with the distribution 
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of X are displayed in Fig. S2 in the SI. Using y as an order parameter, we calculated the 
fraction of molecules in the folded and unfolded basins as a function of temperature T (see 
SI for details). The radius of gyration Rg, is calculated using Rg = (l/2iV^)(X) 

Identifying the folding network: In order to determine the network of connected 
states during Ub folding we clustered the conformations of Ub using a structural metric based 
on the Distribution of Reciprocal of Interatomic Distances (DRID)^^ and leader-like cluster¬ 
ing algorithm^®’^^. To evaluate the DRID metric, two sets of atoms are required. The hrst is a 
set of n centroids, and the second is a set of atoms Natom- The centers of the 74 backbone sites 
of the SOP-SC model (out of the 76 backbone beads, the 2 termini backbone sites are omit¬ 
ted) are used for both the centroid set (n = 74), and the distance evaluation set [Natom = 74). 
For each individual centroid i, three moments of distribution of reciprocal distances (/Xj, z/j, 
are used to describe the features of atomic distances in a particular conformation. Hence, a 
conformation is described by a DRID vector of 222 (=3 x n) components. The geometric dis¬ 
tance p, between two conformations described by the DRID vectors (p.*, z/j, ^i) and (p', z/', 

is obtained by p = f (l/3n) X] [(hi ~ + (d — + (6 ~ ■Cz)^] I ■ The moments of 

distribution of the reciprocal distances for centroid i (pz, z/j,^*) are 
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where nbi is the number of atoms in the distance evaluation set bonded to the centroid i, 
the summation is over the set of atoms assigned for distance evaluation, and the asterisk 
in Eq. 6a indicates that the atoms bonded to the centroid i are omitted in the summation. 
Two conformations are assigned to different clusters if the geometric distance p between 
them is greater than a certain cutoff value. 

To identify a suitable cutoff value of p for clustering the conformations, the number of 
clusters Ndust as a function of different cutoff values of p is calculated (Fig. S3). The N^ust 
increases exponentially as the cutoff value for p is decreased (Fig. S3). For the Ub clusters 
in low pH (Fig. 4) we use a cutoff value p = O.OOShA”^ because for this value 8 clusters 



with probabilities greater than 0.01 exist. Based on the secondary structural content in the 8 
clusters, we further coarse-grained them into 5 clusters (Fig. S4). The cumulative probability 
of observing a conformation in one of the 5 clusters exceeds 0.98, which means most of the 
sampled conformations can be uniquely assigned to one of the major clusters. Similar 
procedure is used to cluster Ub conformations in neutral pH (see Fig. 5). The clustering 
analyses were performed for conformations sampled both at equilibrium and during the 
folding process. 


Results and Discussion 

pH-dependent heat capacity: The heat capacity, C^(T) (= , {E(T)) and 

ks are the average internal energy and Boltzmann constant respectively), as a function 
of temperature, T, shows that Ub folds in low and neutral pH in an apparent two-state 
manner (Fig. IB). The melting temperature of Ub in low (neutral) pH is ~ 353iF(354iF). 
These values are in reasonable agreement with experiments^^’^®, which show that varies 
approximately from 320K to > 360K depending on pH. The computed heat heat capacity 
curves are only in qualitative agreement with calorimetric data^"^. The dramatic changes in 
the pH-dependent are not quantitatively reproduced. Most importantly, the full width 
of Cv{T) at half maximum, which changes from ^ 18iF at pH = 2 to about ~ lOiF in pH = 
4 in experiments, is ~ 5iF in simulations (Fig. IB). The calorimetric enthalpy^^’^® estimated 
from the specihc heat data in low (neutral) pH is 142.1(142.5) kcal/mole. These values are 
approximately double the experimental values^^. Interestingly, all atom simulations^^ greatly 
underestimate the calorimetric enthalpy. In general, it is difficult to compute accurately heat 
capacity curves using simulations with any empirical force held. In light of this observation, 
we consider the agreement between simulations and experiments using the same force held 
(meeting the transferable criterion) as in the our previous reports^^’^^ on SH3 and GFP as 
reasonable. 

Temperature dependent ordering of the NBA and secondary structural ele¬ 
ments (SSEs) : The temperature dependence of the fraction of Ub in the NBA, JneaIT), 
shows a cooperative transition to the folded state (Fig. 2A). The melting temperature, de¬ 
termined using fNBAiTm) = 0.5, coincides with the peak position of Cy(T) (Fig. IB). To 
dissect the ordering of the secondary structural elements, SSEs (= /di/32, /di/ds, /ds/ds, L 1 L 2 , 
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ai, 0 : 2 ), we computed fss = (Ngs) /N°^ where (Ngs) is the average number of native contacts 
present in SSE at T, and N°g is the total number of such contacts in the coarse-grained 
PDB structure. The secondary structural elements /di/ds, and L 1 L 2 , which are stabi¬ 
lized by non-local contacts are absent in the unfolded Ub. The rupture of these contacts is 
primarily responsible for the unfolding of the protein at T > in both low and neutral pH 
(Fig. 2). In contrast, ai, 0:2 and P 1 P 2 , stabilized by local contacts, persist even at T > 
and ~ 50% of the contacts between the residues stabilizing these structures are present even 
at T ^ 400K (Fig. 2). Interestingly, snapshots from atomically detailed simulations^^ at 
T > Tm also show (see Fig. 2 in^^) persistence of helical structures. 

pH-dependence of the radius of gyration (Rg)'- Plots of average Rg as a function 
of T show that the dimension of Ub at T >?» 355K in neutral pH is considerably more 
compact than in acidic pH (Fig. 3A). The unfolded state Rg distribution at neutral and 
acidic pH shows conformations with Rg in the range 20A ~ .Rg ~ 40A with the peak of the 
probability distribution at ~ 30A and ~ 23A for high pH and neutral pH respectively (see 
inset in Fig. 3B and C). The average values of Rg in the unfolded state at high temperatures 
are ~ 30A and ~ 25A at low and neutral pH, respectively. These values are in excellent 
agreement with experiments®^^®^, which report a mean Rg of Ub at pH 2.5 and 7.0 are ~ 32A 
and ~ 26A respectively. In neutral pH, dominated by attractive electrostatic interactions, 
Ub samples compact conformations as the centers of mass distance between the secondary 
structural elements P 1 —P 5 , P 3 —P 5 , L 1 — L 2 are in proximity, thus explaining the compactness 
(Fig. S5). In acidic pH, the interaction between these SSEs are not nearly as strong resulting 
in expansion of the polypeptide chain (Fig. S5). 

The {Rg) of the unfolded state of Ub computed from the probability distribution obtained 
from atomistic simulations with a modihed CHARMM22 potential is (Rg) ~ 14.5A (Piana 
et alP^, Fig. SI) compares poorly with the experiments (Fig. 3A). Although Ub samples 
conformations with Rg in the range 12A ^ Rg 33% in these simulations"^^, the peak of 
the probability distribution is between 12 — 13A, resulting in (Rg) ^ 14. 5A. In contrast, 
simulations®^ using the OPLS force held show that Rg of the unfolded protein is in the 
range 12% ~ .Rg ~ 40%, with an estimated {Rg) ~ 21-22% showing that even the sizes 
unfolded states of proteins cannot be computed unambiguously using all atom empirical 
force helds®^. More recently, it has been shown that atomic description produces unusu¬ 
ally compact unfolded states®®. Hydrogen exchange experiments®® conhrm that the current 
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atomistic forcefields sample compact unfolded conformations with persistent native-like sec¬ 
ondary structure due to excessive intramolecular hydrogen bonding. It should be noted that 
recent computations®®’®^ suggest that by tuning the protein-water interactions®® or by using 
a variant of a water model generated by adjusting the dispersion interactions®'^ one can alter 
the dimensions of the unfolded or intrinsically disordered proteins, providing reasonable Rg 
values in better agreement with the experiments. It remains to be ascertained if these hxes 
also produce less compact structures for proteins with native states. 

Hint of a high energy intermediate in the free energy snrface at nentral pH: 
The folding trajectories in Fig. S2 show that at T^, Ub makes a number of cooperative 
transitions between the Native Basin of Attraction (NBA) and Unfolded Basin of Attraction 
(UBA). In acidic pH, such transitions between NBA and UBA in a 2-state manner (Fig. 1C). 
On the other hand, in neutral pH the NBA —)■ UBA involves an intermediate (Fig. ID) 
although its presence is not evident in the specihc heat plot (Fig. IB). Using these folding 
trajectories we constructed the free energy surface, AG{Rg, x) = —ksTm ln(T’(i?g, y)), where 
P{Rg, x) is the joint probability distribution of Rg and y at T^. The AG{Rg, y) prohles 
display two major basins (NBA and UBA) in low and neutral pH conditions (Fig. 1C and D). 
These two basins are separated by a barrier (Fig. 1C and D). In neutral pH the free energy 
surface has an additional high energy basin, which can be associated with an intermediate 
(Fig. ID). The shoulder, corresponding to the intermediate, is on the NBA side. Below we 
show that the shoulder corresponds to the metastable states sampled by Ub in the UBA 
(see below). 

The AG{Rg, y) prohles show that acquisition of the native state occurs only after sub¬ 
stantial compaction of the polypeptide chain (Fig. 1C and D). In both neutral and acidic pH 
the value of y, even after a large decrease in Rg, is relatively high. We infer that folding only 
commences after populating an ensemble of minimum energy compact structures®®, which 
has been explicitly demonstrated for Ub folding using single molecule pulling experiments®®. 

Network of connected states at Tm- In order to assess the complexity of the folding 
thermodynamics of Ub, we performed a clustering analysis (see Methods) at the melting 
temperatures using a 37/rs trajectory at acidic pH and a 5/is trajectory in neutral pH 
(Fig. S2). Even though at T^, the protein in low and neutral pH appears to fold in a 2- 
state like manner (Fig. 1C and D), the clustering analysis reveals that Ub samples prominent 
metastable states where it adopts secondary and tertiary structures to varying degrees (Fig. 4 
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and 5). 

Of the five prominent clusters identified in acidic pH, three are metastable states labeled 
MSl-3 in Fig. 4. In the MSI state, the SSEs /3i/32, /di/ds and L 1 L 2 are formed whereas 
in MS2 only the /3i/S2 hairpin and /Si/ds are present. In the MSS state the hairpin (3il32 
and L 1 L 2 contacts are present. The network of connected states shows that the dominant 
thermodynamic pathway for assembly of Ub is NBA ^ MSI ^ MS2 ^ UBA. Although, 
conformations belonging to MSI and MS2 are sampled in the folding trajectories, globally 
Ub appears as a 2-state folder because the lifetimes of the MSI and MS2 states are small at 
T^. (Fig. IB, S2). 

In neutral pH, the reversible pathway between UBA and NBA involves MSI and MSS 
states (Fig. 5). Electrostatic interactions destabilize the MS2 state, and hence is not sampled 
in the folding pathways. The MSI state has a long enough lifetime in neutral pH that it is 
discernible as a high energy intermediate in the free energy surface in Fig. ID. The dominant 
Ub folding pathway in neutral pH connects the states NBA ^ MAI ^ MAS ^ UBA, 
which is different from the dominant pathway in low pH. This is the hrst indication that the 
folding mechanism depends on pH, which we demonstrate below using kinetic simulations. 
The MSS, state, which is not a part of the low pH dominant folding pathway (Fig. 4), is 
stabilized in neutral pH by favorable interactions among the charged residues at the interface 
of the SSEs Li, L 2 and ai (Fig. SI). Interactions associated with these structural elements 
play a key role in Ub folding close to neutral pH (see below). 

The thermodynamic pathway can be compared to experiments. Based on experiments^®’^^ 
at pH 7.5 and T-analysis two folding pathways for Ub were inferred. In the major folding 
pathway, the hairpin I3il32 forms, then the helix ai stacks onto (^ 1(^2 forming tertiary contacts. 
Subsequently sheet (3^ interacts with /3i forming the /di/ds contacts. The dominant pathway 
inferred from experiments on the mutant F45W agrees partially with our predictions in 
neutral pH. Our analysis (Fig. 5) reveals that tertiary contacts exist only between /di/32 
strands. We hnd that the contacts between ai and /Si/d 2 alone are not stable. We suggest 
that the next step in the assembly is the formation of contacts between ai, Li and L 2 due to 
the charged residues (Fig. SI), giving rise to transient population of the MSS state. Finally, 
sheet /ds interacts with /3i enabling the formation of the rest of the tertiary contacts. In such 
compact intermediates where L 1 L 2 and /di/ds contacts are formed we also observe interactions 
between ai and /di/32 in some of the Ub folding trajectories in neutral pH conditions (see 
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below and structure 13 in Fig. 8). 

The dominant pathway identihed using simulations in low pH (Fig. 4) agrees with the 
minor pathway inferred from the experiments^®’^^ at pH 7.5. In this pathway, the hairpin 
/3 i/ 32 forms hrst, then the sheet (3^ interacts with /9i forming the /Si/ds contacts (Fig. 4, 
clusters MSI and MS2). Subsequently, helix ai forms tertiary contacts with sheets /di, /d 2 
and /ds. The UBA cluster is similar to the structure f/1 in the atomistic simulations^^ where 
the contacts between the sheets /di/d 2 are present. The conformations of Ub in the MSI and 
MS2 states with contacts between /di/d 2 , /di/ds and L 1 L 2 are similar to the cluster Ul' in the 
atomistic simulations. 

Temperature and pH dependence of folding mechanism: We generated at least 
hfty folding Brownian dynamics simulation trajectories in low and neutral pH at two temper¬ 
atures to probe the dynamics of Ub folding. The simulations are initiated from an ensemble 
of unfolded conformations generated at temperatures T > T^. Regardless of the tempera¬ 
ture or pH the SSEs, oi, 0 : 2 , and /di/d 2 hairpin are the hrst structural elements to form. The 
variations in the self-assembly of Ub occur only in the subsequent stages. 

Low pH: At T = 30077 Ub folds along four pathways, which we illustrate using one of the 
folding trajectories (Fig. 6A and 7). In all the pathways the SSEs ai, 0 : 2 , and the hairpin 
/di/d 2 are always present due to their stability. Subsequently there is a bifurcation in the 
folding pathways. Using energy per residue as a reporter of folding, we hnd that in the 
pathways KIN2 and KIN3, one intermediate is populated prior to reaching the folded state 
whereas there is no persistent intermediate in KINl (Fig. 6). The observed bifurcation in 
the folding pathways, where one route involves a direct UBA —)■ NBA transition while in 
the rest NBA is reached in stages, is the hall mark of the kinetic partitioning mechanism 
(KPM)^^^^^. The two intermediates are structurally different (Fig. 6A, II and 12). In KIN4, 
both the intermediates (Fig. 6, II and 12) are sampled whereas folding in KIN2, only II is 
sampled and 12 is accessed in KIN3. II is stabilized by the secondary structural elements 
/di/S 2 and L 1 L 2 , whereas 12 is stabilized by the contacts between /9i/32, L 1 L 2 and /ds/^s (Fig. 7). 
There are similarities between II and the MS3 state (Fig. 4) identihed in the equilibrium 
simulations. 

In KINl, where Ub appears to fold in a 2-state like manner (Fig. 6A), the contacts stabi¬ 
lizing the secondary structures L 1 L 2 , /di/ds and /ds/ls form almost simultaneously (Fig. 7A). 
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They assemble successively separated by a time ~ 15/is (Fig. 7A), which is only a fraction 
of the hrst passage time. The interactions among the non-local SSEs needed to stabilize the 
compact states, and collapse of Ub measure by decrease in Rg{t) occur nearly simultaneously 
(Fig. 7A). Folding along this pathway thus follows the nucleation-collapse (NC) mechanism. 

In KIN2 and KIN4, the L 1 L 2 contacts form ahead of /di/ds and leading to II (Fig. 7B 
and 7D). The kinetic intermediate 12 is formed when contacts L 1 L 2 and are established 
simultaneously or successively as observed in KINS and KIN4 respectively (Fig. 7C and 
7D). In the other pathways intermediates with well-dehned structures form (Fig. 7B and 
7C). These hgures also show that Rg{t) decrease continues even after some of the non-local 
SSEs form. The route to the native state in KIN4 is via both the intermediates found in 
KIN2 and KINS (Fig. 7D). The presence of a nearly direct transition to the native state 
in KINland folding through the intermediates in the other pathways is in accord with the 
kinetic partitioning mechanism (KPM). 

The assembly of native Ub along the KIN2, KINS, and KIN4 pathways at T = SOOiF 
can be rationalized by the diffusion-collision mechanism (DCM)^^. In the nascent stages 
of folding microdomains (for example (^ 1(^2 and /di/ds sheet in IS) form which diffuse freely. 
Subsequently, some of these collide and coalesce to form the kinetic intermediates with 
lifetimes on the order of ~ 100/is. In the hnal stages of folding, the rest of the secondary 
structural elements collide with the core of the protein structure, and coalesce to form the 
native structure (see below for further discussion). 

At the higher temperature T = 332K Ub folds by the KPM (Fig. 6B). In KINl, the 
protein folds in a 2-state manner, and in KIN2 a single intermediate (Fig. 6B) whose struc¬ 
ture is different from the ones observed at T = 30077 is populated. This intermediate^*’’^^, 
stabilized by the SSEs /9i/d2, L 1 L 2 , and /di/ds (Fig. 6, 13), is similar to the MSI state at T^, 
thus providing evidence that Ub samples the equilibrium structures in the process of folding 
(see below for additional discussion). By analyzing the formation of individual secondary 
structures (Fig. S6) we hnd that at the higher temperature, the contacts which stabilize 
L 1 L 2 leading to SI (MS3 cluster) are unstable (Fig. S6) in contrast to what is observed at 
T = 30077 (Fig. 7B and 7D). The L 1 L 2 and /3i/3z contacts are stabilized simultaneously at 
T = 33277 (Fig. S6) for timescales on the order of ~ lOO/xs leading to 13. At higher temper¬ 
atures the contacts between the /9-sheets /di/ds are important in stabilizing the intermediate, 
as these are the end-to-end contacts in Ub. These interactions minimize the conforma- 
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tional fluctuations leading to lifetimes that are long enough for S3 to form (Fig. 6B). At 
T = 332K, a 2-state like folding pathway is observed when /ds/ds contacts immediately form 
after structuring of L 1 L 2 and /di/ds contacts (Fig. S6A). 

Neutral pH: Folding at neutral pH, where electrostatic interactions play a major role, is 
dramatically different. In this case, both at T = 335K and 300iF a well populated interme¬ 
diate is observed (MSI in Fig. 5, and 12 in Fig. 8). In particular, due to the electrostatic 
interactions between residues in Li and L 2 , the L 1 L 2 loop formation precedes the formation 
of the 12 intermediate (Fig. S7). The assembly of Ub occurs by the DCM where preformed 
micro-domains collide. The intermediate 12 can further become compact due to favorable 
electrostatic interactions between the helix ai and /32 strand leading to 13 (Fig. 8 and S8). 
This intermediate is not observed in high pH folding. The hnal stage of compaction is 
delayed until the micro domains adopt native-like topology (Fig. S7). 

Comparison with experiments: It should be borne in mind that there are discrep¬ 
ancies in the interpretations of the different experimental data, which makes it difficult to 
make direct comparisons with a single experiment. With this caveat, we note that our 
hndings, which are by and large consistent with experiments^^’^^’’^®^®’^, provide a complete 
picture of the structures sampled by Ub during folding. However, experiments have only 
characterized a subset of the predicted intermediates. In accord with the present hndings, 
experiments inferred that Ub folds through an intermediate at low temperatures or mildly 
denaturing conditions or when mutations slow down folding. A common characteristic in 
all the intermediates, regardless of pH or temperature, is that the (^ 1(^2 hairpin is stable 
for which there is substantial experimental evidence^^’^®’®®. The II intermediate found here 
rationalizes experimental studies^®’®^’®®, which provide evidence for a stable ai helix and 
unstable /ds, (3^ and (3^ strands. In addition protein vivisection suggested®® an intermediate 
structure of Ub ubiquitin in which the small [3^ (Fig lA) strand is unstructured. (Fig. 6 and 
8B). Taken together we conclude that our simulations provide a complete structure of the 
populated intermediates Filing in gaps in experimental studies. 

The environment-dependent complex folding pathways are captured in Fig. 10. The 
folding mechanisms, involving characterization of the network of connected intermediate 
structures and transitions between them, are vastly different if the external conditions are 
altered. The most general characteristic is that folding is a stochastic process in which 
assembly occurs by multiple pathways. Depending on the conditions the hux through these 
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pathways can be altered and as demonstrated here one can even a single dominant pathway 
for folding. Validation of our predictions requires experiments probing folding kinetics as 
function of pH and temperature. 

Coincidence of equilibrium and kinetic intermediates: The structures of the II 
kinetic intermediates in low pH at T = 300i^, and 13 at T = 332K (Fig. 6) are similar 
to those observed in the MSI and MS3 states (Fig. 4). The dominant folding/unfolding 
pathway identihed at (Fig. 4) is very similar to the KIN2 folding pathway at T = 335K. 
In neutral pH the dominant folding pathway (Fig. 5) is identical to the folding pathways at 
both T = 300i^ and 335K. 

The coincidence of equilibrium and kinetic intermediates is not without precedence. A se¬ 
ries of insightful NMR experiments have established that during the folding of apomyoglobin 
a kinetic intermediate is populated®^ that has the same structure as the one characterized 
at equilibrium^F Our study leads to the prediction that the network of states accessed ki- 
netically are also found in folding trajectories at equilibrium. This prediction can be tested 
using NMR experiments as a function of pH just was done for apomyoglobin. 

Identification of the Transition State Ensemble (TSE) using Pfoid- The transition 
state structures of Ub are identihed from the folding trajectory at Tm (Fig. S2). Putative 
transition state structures are picked from the saddle-point region of the free energy projected 
onto the variables E and y using the conditions —50.Okcal/mole < E < —AO.Okcal/mole 
and 0.66 < y < 0.68. Starting from these structures we calculated the commitment proba¬ 
bility, Pfoid^"^, of reaching the NBA. The set of structures with Pfoid ~ .4 — .6 is identihed 
as the transition state ensemble (TSE) (Fig. 9C, S9). To our knowledge this is the hrst 
demonstration that TSE has been quantitatively identihed for Ub without any prejudice 
about the underlying reaction coordinate. 

The average TSE structure is globular with most of the SSEs and tertiary contacts intact 
as in the folded state, and they are fairly homogeneous (Fig. 9A) supporting the conclusions 
based on T-value analysis™’^^ and all atom simulations^^. The average y, an estimate of 
the contact-order in the TSE, is approximately 0.67 in agreement with reported values for 
various proteins®^. The /di/32 hairpin and ai are fully structured in the TSE which is not 
surprising given their thermodynamic stability (Fig. 3B). Compared to the folded structure 
the contacts between the /d-sheets /ds/ds are absent in the TSE, although the strands /da and 
/ds are fully structured (Fig. 9B). The formation of a compact TSE in which majority of the 
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SSE and tertiary interactions are consolidated further supports that at T ^ Ub folds by 
the NC mechanism. 

The TSE structures identihed in the simulations are in reasonable agreement with the 
inferences drawn from the <h-value analysis®*^, and are in better agreement with the T-value 
analysis’^^’®® and T-jump infrared spectroscopy experiments^^. Based on these studies'^^’®"^’®® 
it is suggested that the N-terminal part of the protein, helix ai and sheet /di/32, are ordered 
in the TSE, which agrees with our simulations (Fig. 9A). However, the experiments disagree 
among each other on the TSE structure in the C-terminal region of the protein. The <h-value 
analysis®^ suggest that the C-terminal region of the protein is unfolded while the T-value 
analysis^^’®^ and T-jump infrared spectroscopy experiments^^ infer the opposite. According 
to these experiments the transition state is extensively ordered with structure comprising the 
four /d-strands and the a-helix. Our simulations show that the /ds does make contacts with 
/3i in the TSE in agreement with the experiments based on T-value analysis. In addition, 
the Pfoid analysis shows that the C-terminus a-helix is at least partially structured. 

A picture of the TSE using all atom MD simulations in water and projection onto a 
one-dimensional reaction coordinate was proposed^^. Using the dynamics in the projected 
coordinate they computed <h-values for only hydrophobic residues using certain (untested) 
assumptions. The trends (not absolute values) in experiments and simulations are similar^^. 
On this basis they asserted that the structures in the barrier region in the one-dimensional 
coordinate is the TSE. Because of the completely different methods and the models used 
(our TSE is most appropriate for acidic pH) in the two studies it is difficult to directly 
compare the p/oid-based determination of the TSE with the one from^^. Nevertheless, in 
both the studies TSEs are homogeneous, compact, and native-like. 

Relevance of non-native interactions: We provide generic arguments showing that 
non-native interactions ought to play only a sub-dominant role in the folding of evolved 
small proteins that ostensibly fold in an apparent two-state manner. In order to keep the 
arguments simple let us assume that non-native interactions largely affect the unfolded state. 
There is anecdotal evidence that this is the case in a mutant of NTL9®®. We write the free 
energy difference between the unfolded states containing non-native (NN) interactions and 
one described using only native (N) interactions as AAGu = AHjj — T{ASu) where AHu 
{ASu) is the enthalpy (entropy) changes between the NN and N models of the unfolded state. 
Typically, but not always, we expect that NN interactions ought to stabilize the unfolded 
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state compared to the unfolded state described by the N model. However, AHu < 0 also 
implies T{ASu) will be negative because certain conformations formed by favorable NN 
interactions are disallowed in the native interaction dominated model. Thus, the sign of 
AAGu is determined by the magnitude of T{ASu), which cannot be too large to negate 
AHu. If if were the case then NN interactions alone would stabilize the folded states to a 
greater extent than N interactions, which is unlikely. So we will assume that I t^asu) I ^ 

Because hydrophobic interactions between small hydrophobic species is entropic in origin 
we expect that non-native interactions are most relevant when a salt bridge, not present 
in the folded state, can form in the unfolded state, and hence may be important in Ub 
folding around neutral pH. It is clear that 1 where AG mu is fhe stability of the 

native state with respect to the unfolded state. If this inequality is violated then the protein 
would not fold! Thus, it follows that NN interactions are most likely to be perturbation and 
not a dominant determinant of the folding thermodynamics. This conclusion is increasingly 
valid for proteins whose native states are highly stable. A corollary of this argument is that 
folding rates, and in most cases unfolding rates as well, are unlikely to change signihcantly 
(less than an order of magnitude). Our previous works on lattice models, which treated N 
and NN interactions on equal footing, provide illustrations of the arguments provided here. 

Do the arguments given above imply that NN interactions®’^ are not relevant at all? We 
discuss two examples suggesting that favorable NN interactions may affect the stability and 
kinetics of folding. (1) Mutations of surface exposed charged residues in Fyn SH3 domain, a 
small protein (?« 56 residues) showed the folding rates, with respect to the WT, increases by 
a factor of ~ 3 for the E46K mutant and by a factor of ~ 8 for mutant for the E46K-E11K- 
D16K-H21K-N30K (Fyn5) mutant (see Table I in®®). The unfolding rate decreases only by a 
factor of two for these mutants, which can be taken to mean that the effects of the dramatic 
mutations affect largely the unfolded states. Even though these mutations, especially the 
Fyn5 construct, are drastic the effect on the folding kinetics is modest and the factor of eight 
increase can be accounted for by ~ 2 KbT change in the barrier height, which mirrors roughly 
the enhanced stability of the Fyn5 mutant (see Table I in®®). The relatively small changes 
(less than a factor of ten even in Fyn5 mutant) in the rates are in accord with the arguments 
given above. To explain these changes coarse-grained simulations were performed using Ga 
representation of the protein. We do not believe the explanation based on these simulations is 
adequate for two reasons. First, in these simulations included only NN interactions between 
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charged residues using the Debye-Huckel potential. The parameters of the NN electrostatic 
interactions are very different from those used for native electrostatic interactions. In other 
words, NN and N interactions are not treated on equal footing. Second, the inferences that 
non-native interactions might be present in the transition state were made based on free 
energy profiles computed using the fraction of native contacts without the beneht of Pfoid 
analysis. (2) In a more compelling case, it has been shown in a number of NTL9 mutants the 
unfolded state may be stabilized by a non-native salt bridge®®. The most dramatic change 
occurs in experiments at pH 5.5 in the K12M mutant (a rather large perturbation) in which 
the free energy of the mutant is enhanced by ~ SKgT (a 50% increase) relative to the WT, 
with all other mutants exhibiting less than KbT change (see Table 1 in®®). Apparently in 
the WT a salt bridge forms between D8K12 in the unfolded state, which is clearly abolished 
in the mutant, thus increasing the free energy of the unfolded state. The result is the K12M 
is more stable than the WT. These results were explained by using CG simulations in which 
Debye-Huckel potential was used with NN interactions only between charged residues®®. In 
this study NN and N electrostatic interactions were treated on equal footing. The hndings 
corroborate the experimental observations. 

We draw two conclusions fro the discussion presented here. First, only in dramatically 
altered sequences NN interactions are signihcant in affecting the stability. Second, these 
changes involving charged residues can be taken into account within CG models by slightly 
altering the strength of hydrophobic interactions. After all large perturbations in both Fyn 
SH3 and NTL9 can modulate both electrostatic as well hydrophobic interactions because one 
expects changes in hydration in the unfolded states as a consequence of these mutations. 
Based on the current evidence from simulations^^’^®’^^, we conclude that generically non¬ 
native interactions are likely to be only a perturbation and not a dominant factor in the 
folding of small single domain proteins. This conclusion is in accord with the arguments 
given above and theoretical considerations^. 

Conclusions: 

In summary, using coarse-grained models and molecular dynamics simulations we have 
dissected the folding of Ub as a function of temperature at acidic and neutral pH. The major 
findings in this work are: (1) We predict quantitively the pH-dependent changes in the radius 
of gyration of Ub. The values of mean Rg at high temperatures are in excellent agreement 
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with experiments. (2) A significant prediction of our study is that the folding pathways 
can be dramatically altered by changing pH. The major pathway at low pH resembles the 
minor pathway at neutral pH. The structures of some of the intermediates and transition 
states, which are only indirectly inferred from experiments, are fully resolved. (3) Our work 
also highlights the balance between the number of local and non-local contacts determines 
the folding mechanisms of protein folding in generah°°. In the context of Ub, the secondary 
structural elements stabilized by local contacts form in the early stages of the folding process. 
Only subsequently and after considerable compaction, secondary structures, stabilized by 
non-local contacts, form. The formation of these non-local contacts determine the folding 
rates and are strongly influenced by the folding conditions. (4) Although there are dominant 
folding pathways under all conditions for folding in general, and Ub in particular, self- 
assembly can occur by alternate sub-dominant routes. Thus, the assembly mechanism of 
proteins should be described in probabilistic terms - a notion that appears naturally in the 
statistical mechanical description of folding^’^^. In accord with this general principle, we 
find that the folding mechanism is complex especially at acidic pH. Under these conditions, 
a fraction of molecules folds by a nucleation-collapse mechanism where as in others long 
lived meta-stable intermediates are populated prior to collapse and the formation of the 
native-state. This finding is in line with the KPM^^, which is now firmly established for a 
number of proteins^^’^^. At neutral pH, Ub folds by a sequential mechanism in which local 
SSEs first form. Subsequently an intermediate stabilized by long range contacts (T 1 T 2 and 
/di/ds) is populated prior to the formation of the native-state. 

An important enterprise in molecular simulations is to benchmark forcefields, which 
should be done by comparing simulations and experiments. Minimally such comparisons 
should include specific heat profiles, dependence of the dimensions {Rgs) of the protein as a 
function of temperature and denaturants, and time dependent changes in Rg and other 
measurable properties probing the kinetics of self-assembly. We hasten to add that it 
is almost impossible to calculate accurately (nor should one attempt such computations) 
material-dependent properties (specific heat being one example) using simulations with ad 
hoc empirical force-fields including CG models or atomically detailed models. For purposes 
of direct comparisons with experiments it is prudent to create transferable CG models^'^’^^^ 
by benchmarking against experiments. The transferable CG force-field we have created has 
been remarkably successful in semi-quantitatively reproducing many experimental quanti- 
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ties for srcSH3^^ and GFP^^ as a function of denaturants. Such simulations are currently 
beyond the scope of atomic detailed simulations because of lack of reasonable force helds for 
denaturants and the sheer size of GFP. 

Despite the ability to reproduce experimental measurements and make testable predic¬ 
tions for a large number of proteins using coarse-grained models they have obvious lim¬ 
itations. The absence of explicit inclusion of the solvent, which has an impact on the 
fluctuations of the unfolded state, makes it difficult to quantitively reproduce the measured 
heat capacity curves. Finally the knowledge of the native structure needed in these sim¬ 
ulations can be legitimately criticized. Despite these reservations the potential utility of 
coarse-grained models in protein and RNA folding is substantiaF'^. Most importantly, such 
simulations can be carried out using standard desktop computers. 
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FIG. 1: Thermodynamics of Ub folding, (a) Ribbon diagram^^^ of the crystal structure of Ub 
(PDB ID: lUBQ). The strands are labeled /3i, (32, Ps, Pa and /^s, helix is ai, 3io helix is a 2 , and 
the interacting loops are Li and L 2 . The sequence of Ub is given below. Positively charged residues 
are in green and negatively charged residues are in red. (B) Heat Capacity Cy as a function of 
temperature T at low pH (empty squares) and neutral pH (solid squares). For comparison we 
show the experimental data^^ in blue for heat capacity at pH= 3.0. Although the calculated 
deviates only by a few degrees from experiments the simulations do not capture the measured peak 
height and the width of the heat capacity curve. (C) The free energy surface of Ub at low pH and 
Tm = 353 K projected onto the radius of gyration Rg, and structural overlap parameter x, shows 
2-state behavior. (D) Same as (C) except the free energy profiles correspond to folding at neutral 
pH and = 354iF. The circle highlights a high energy intermediate. 
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FIG. 2; (A) Fraction of the protein folded, /nba in low pH (empty squares) and neutral pH (solid 
squares) as a function of temperature, T. Fraction of the various secondary structural elements, 
fss as a function of temperature T in (B) low pH and (C) neutral pH. The secondary structures ai, 
Q! 2 , and /3 i/ 32 (Fig. lA) stabilized by local contacts do not completely unfold even at temperatures 
above T^. Ub unfolds upon rupture of non-local tertiary interactions involving ^13^ and 

L1L2. 
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FIG. 3: (A) Rg as a function of T for low pH (empty squares) and neutral pH (solid squares) 
calculated from SOP-SC simulations. Arrows correspond to (Rg) of Ub in the unfolded state 
from atomistic simulations^^ using modified CHARMM22 (black inverted triangle) and OPLS 
forcefields®^ (red triangle) which are ~ 14. SA and ~ 22A respectively. Probability distribution of 
Rg (B) low pH at T=343K (red), T=353K(green), T=363K(black) and (C) neutral pH for T=325K 
(red), T=353K(green), T=375K(black). Inset in (B) and (C) shows Rg distribution in the unfolded 
state. 
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FIG. 4: Network of connected states obtained using a clustering analysis of the folding trajectories 
in low pH at T ~ Tm (=353K). There are three metastable clusters labeled MSl-3, which are 
considered equilibrium intermediates. A representative conformation with the secondary structural 
elements from each cluster is shown. Dark arrows correspond to the dominant pathway and the 
grey arrows show possible subdominant routes connecting NBA and UBA. Interestingly, MS3 is 
not connected to NBA. The numbers show the number of transitions between the clusters in the 
37//S trajectory. 
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FIG. 5: Network of connected states obtained using a clustering analysis of the folding trajectories 
in neutral pH at T ~ Tm (=355K) reveal 4 clusters. In addition to the NBA and UBA, there is a 
intermediate cluster, MSI and a metastable cluster MSS. A representative conformation with the 
secondary structural elements from each cluster is shown. The numbers on the arrows show the 
number of transitions between the clusters in the 5/US trajectory. 
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FIG. 6: Ub folding kinetics in low pH. The folding pathways inferred from change in energy per 
residue as a function of time, t at (A) T = 300K and (B) T = 332K. Two kinetic intermediates 
II and 12 are identified in the folding pathways at T=300K, and the intermediate 13 is populated 
at T = 332K. Representative structures of the kinetic intermediates are on the right. 
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FIG. 7: Fraction of native contacts in various secondary structural elements /33/35, and 

L 1 L 2 ), fss and Rg as a function of time in low pH folding trajectories at T = 300K. The plots in 
four panels are for trajectories labeled KINl, KIN2, KINS and KIN4 in Fig. 6A. The green, blue 
and magenta colors correspond to secondary structures L 1 L 2 , /Si/^s, and /Sa/ls, respectively. Radius 
of gyration, Rg as a function of time is shown in black. 
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FIG. 8: Folding kinetics in neutral pH at T = 300K and 335K. Three intermediates are populated 
in the folding trajectories and representative structures of the intermediates labeled II, 12 and 13 
are shown on the right. 
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FIG. 9: (A) Two superimposed representative structures from the transition state ensemble at low 
pH. (B) The upper diagonal of the plot shows Ca contact-map of the transition state ensemble. 
There are no interactions between (shown in circle). The lower diagonal of the plot shows 
experimental™ T-values. (C) The distribution of the final structural overlap parameter, Xi of 
least 500 simulation trajectories spawned from the transition state structures. Data is shown for 
five different structures. The distribution shows that roughly half of these trajectories go to the 
folded basin and the other half reach unfolded basin {pfoid ~ 0.5). 
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FIG. 10: Complex folding pH and temperature-dependent folding pathways for Ub. The first 
index in [LjTi; a] denotes low pH, the second is temperature, and a = 1, 2, 3, 4 labels the kinetic 
pathways. A similar interpretation with T 2 > Ti holds for [H,r 2 ; a] with H standing for neutral pH. 
In the unfolded state secondary structures stabilized by local contacts (ai, 02 , /3i/32) are always 
present. The transition from UBA to NBA in [L,Ti,l] and [L,r 2 ,l] (green arrow) is described by the 
nucleation-collapse (NC) mechanism. Blue, red, and black arrows routes to the NBA from UBA 
at Ti through well-dehned intermediates whose structures are displayed. The high temperature 
route through an intermediate is shown by lavender arrows. The folding pathway at neutral pH at 
both temperatures is in aqua blue. Three of the four intermediates (the left two and the one on 
the right) are also populated in equilibrium folding trajectories. 
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SI Text 


Simulations: In order to sample the conformations of the polypeptide chain efficiently we 
used low friction Langevin dynamics simulations [1] to calculate the thermodynamic properties. 
The equation of motion used in the Langevin dynamics simulations is mTy = —+ T, 
where m is the mass of the protein beads, ( is the friction coefficient, fl is the position of the 
bead i, Fc = , T is the random force with a white noise spectrum. In the discretized 

form the autocorrelation function of the random force is (r(t) T{t + nh)) = where 

n = 0,1,... and (5o,n is the Kronecker delta function. The Langevin equation is integrated using 
the velocity Verlet algorithm[l, 2]. We used ( = 0.05m/TL and h = O.OOSri, where tl is the 
unit of time to compute the thermodynamic properties. 

To provide a realistic description of the folding kinetics we performed Brownian dynamics sim¬ 
ulations and the equations of motion are integrated using the Ermak-McCammon algorithm[3], 
Fi(t + h) = Fi(t) + + T. Here T is a random force with a Gaussian distribution with mean 

zero and variance (r(h)^) = TPe friction coefficient ( = bOm/rn approximately corre¬ 

sponds to the value in water and h = O.OOSth. In the simulations, the characteristic unit of 
length a = lA, energy e = Ikcal/mole, and mass m = 1.8 x (typical mass of the bead). 

The unit of time in Langevin dynamics simulations is tl{= ^/mcFje) = O.Slps. In Brownian 
dynamics, simulation time is mapped into real time, th using th ~ ~ 43ps. 
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TABLE SI; Parameters for the SOP-Side Chain model 


Parameters 

Protein 

Ro 

2.0 

k 

20 kcal/(mol. A^) 

Rc 

8 A 

^bb a 

0.45 kcal/mole 

^bs a 

0.45 kcal/mole 


1.0 kcal/mole 

^bb 

3.8 A 

e 

10.0 


“Values are chosen such that the protein melting temperature in simulations is 
with the experiments [5, 6] 


approximately in agreement 
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FIG. SI: (A) Crystal structure of Ub (PDB ID: lUBQ). The charged residues, shown at the interface 
of the helix ai and loops Li and L 2 , stabilize the intermediate in the folding pathways of Ub in neutral 
pH. (B) The Cq contact map of Ub shows contacts between various secondary structural elements in 
the folded state. 
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FIG. S2: (A) The structural overlap function[4], x, as a function of time for Ub at low pH at = 353K. 
Ub undergoes multiple transitions between the UBA and NBA basins. (B) The probability distribution 
of X, P{x) at Tm- The arrow is the value of x = 0.67 separating the UBA and NBA basins. (C) x as a 
function of time for Ub at neutral pH at Tm ~ 355K. (D) The probability distribution of x, P{x) at 
Tm in neutral pH. The arrow at x = 0.68 separates the UBA and NBA. 
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FIG. S3: The number of clusters Ndust, in the Ub folding trajectory (shown in Fig. S2A) at low pH 
and Tm = 353K as a function of the cutoff for the geometric distance p (defined in the main text) 
between the conformations. The number of conformations increase exponentially as p is decreased. 
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FIG. S4: The probability distribution of the clusters identified in the protein folding trajectory for Ub 
at low pH and Tm = 353K (Fig. S2A) using p = 0.055 A“^. Out of the 187 clusters identihed, the 8 
clusters (1, 6-12) have probabilities greater than 0.01. The cumulative probability of these 8 clusters is 
greater than 0.98. The 8 clusters based on the secondary structural features are further coarse-grained 
into 5 clusters, NBA (1), MSI (6), MS2 (7), MS3(8), UBA(9-12). 















FIG. S5: Probability distribution of the center of mass distance, P{r), between the secondary structural 
elements at T = 400K. (A) /3i (residues 1 to 9) and /35 (residues 66 to 72) (B) /Js (residues 41 to 46) 
and /35 (residues 66 to 72), (C) ai (residues 20 to 27) and L 2 (residues 50 to 58). The data in red and 
blue represent neutral and low pH, respectively. 
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FIG. S6: Fraction of native contacts in various secondary structural elements (,0i/35, and L 1 L 2 ), 

fss and Rg as a function of time in low pH trajectories folding at T = 332K. The plots in the two panels 
are for trajectories labeled KINl and KIN2 in Fig. 6B. Green, blue and magenta colors correspond to 
secondary structures L 1 L 2 , and respectively. Rg as function of time is in black 
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FIG. S7: Fraction of native contacts in various secondary structural elements /33/35, and L 1 L 2 ), 

fss and Rg as a function of time in neutral pH. The plots are for trajectories shown in Fig. 8. (A) 
T = 300K, KINl (B) T = 300K, KIN2 (C) T = 335K, KINl (D) T = 335K, KIN2 . 
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FIG. S8: Ca contact map of the intermediate 13 (Fig. 8) observed in the Ub folding trajectories at 
neutral pH and T = 335K (Fig. 8). The contact map shows the contacts between the helix ai and 
sheet /32- 
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FIG. S9; The structural overlap function, x as a function of time, t, for two different trajectories 
spawned from one of the transition state structures of Ub at low pH with = 353K (Fig. 9). 
Trajectory in red and black end up in the NBA and UBA respectively. 


13 











TABLE S2: Sidechain and backbone radii of amino acids based on partial molar volumes 


Residue Radius (A) 


Gly 

0 

Ala 

2.52 

Val 

2.93 

Leu 

3.09 

lie 

3.09 

Met 

3.09 

Phe 

3.18 

Pro 

2.78 

Ser 

2.59 

Thr 

2.81 

Asn 

2.84 

Gin 

3.01 

Tyr 

3.23 

Trp 

3.39 

Asp 

2.79 

Glu 

2.96 

Hse^ 

3.04 

Hsd 

3.04 

Lys 

3.18 

Arg 

3.28 

Gys 

2.74 

backbone 

2.25 


^‘Hse - Neutral histidine, proton on NE2 atom. Hsd - Neutral histidine, proton on NDl atom. 
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